We propose 'Hide-and-Seek', a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden. Our approach only needs to modify the input image and can work with any network designed for object localization. During testing, we do not need to hide any patches. Our Hide-and-Seek approach obtains superior performance compared to previous methods for weakly-supervised object localization on the ILSVRC dataset. We also demonstrate that our framework can be easily extended to weakly-supervised action localization.
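
The random-hiding step described above can be illustrated with a minimal sketch. It assumes the image is split into a regular grid of square patches and that each patch is hidden independently with a fixed probability. The function name `hide_patches`, the patch size, the hiding probability, and the fill value are hypothetical choices made for illustration and are not specified in this abstract.

```python
import numpy as np


def hide_patches(image, patch_size=56, hide_prob=0.5, fill_value=0.0, rng=None):
    """Return a copy of `image` with randomly chosen grid patches overwritten.

    All parameter defaults are illustrative assumptions, not values
    taken from the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()  # leave the caller's array untouched
    height, width = out.shape[:2]
    # Walk over a regular grid of patch_size x patch_size cells.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Hide each cell independently with probability hide_prob.
            if rng.random() < hide_prob:
                out[top:top + patch_size, left:left + patch_size] = fill_value
    return out


def prepare_input(image, training):
    """Hide patches only during training; test images pass through unchanged."""
    return hide_patches(image) if training else image
```

Because the sketch only alters the input array, it could in principle sit in front of any localization network as a data-augmentation step, which matches the claim that only the input image needs to be modified.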